Accelerated Sampling of Boltzmann distributions 
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The sampling of Boltzmann distributions by stochastic Markov processes, can be strongly limited by the 
crossing time of high (free) energy barriers. As a result, the system may stay trapped in metastable states, and 
the relaxation time to the equilibrium Boltzmann distribution may be very large compared to the available com- 
putational time. In this paper, we show how, by a simple modification of the Hamiltonian, one can dramatically 
decrease the relaxation time of the system, while retaining the same equilibrium distribution. The method is 
illustrated on the case of the one-dimensional double-well potential. 



The sampling of complex and rugged landscapes is a prob- 
lem of major importance in many different fields of Science, 
such as physics, biology, optimization, probability theory, etc 
(2. In statistical physics, simulation methods rely on an effi- 
cient sampling of the important regions of phase space. The 
quality of the sampling guarantees a reliable computation of 
observables. 

There are many Markov processes which can in principle 
sample Boltzmann distributions. Among the most used are 
Monte Carlo and Molecular dynamics methods, the Langevin 
equation Q, etc. combined with simulated annealing or 
quantum annealing (4]. These processes mimick the dynami- 
cal sampling of the phase space, and in many cases, the micro- 
scopic time (time step) must be chosen very small compared 
to the relaxation time in order for the system to indeed sam- 
ple the Boltzmann distribution and reach its thermodynamic 
equilibrium. A typical example of this difficulty is the famous 
"Protein Folding" problem |5|. In this case, given a micro- 
scopic Hamiltonian for the protein, one tries to find the room 
temperature conformation of the molecule by running one of 
the above mentioned algorithms. However, the microscopic 
time (discretization time) is typically of the order of 10 -15 s 
while the typical folding (relaxation) time runs from millisec- 
onds to seconds. The longest simulations available as of now 
for very short proteins run in the 10~ 7 s range, still very far 
from the equilibration time. The same difficulty occurs in 
many disordered systems (spin-glasses), in amorphous sys- 
tems, glasses, polymers, etc. It is thus very important to find 
a way to dramatically accelerate the sampling, so as to make 
it efficient with present days computers. 

In this paper, we assume that the system is subject to an 
overdamped Langevin dynamics (also called Brownian Dy- 
namics). We show how by simply modifying the associated 
Schrodinger equation, one can reduce very substantially the 
relaxation time to equilibrium. 

For the sake of simplicity, we will specialize to the case of 
a one-dimensional particle in a potential well U(x). All the 
following can be trivially extended to the case of any number 
of particles in any dimensions. 
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where D is the diffusion constant of the particle, kg is the 
Boltzmann constant, T is the temperature, and r|(f) is a white 
Gaussian noise such that 
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This Langevin equation describes a Markov process, and 
its probability distribution function (pdf) P(x,t) satisfies the 
Fokker-Planck equation 
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where p = 1 /ksT. The function defined by 

l F(x,f) = e p£/W/2 P(x,f) 
satisfies a Schrodinger equation J6) 
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The solution of eq. (|6]i can be formally expanded as 
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where and E a are the normalized eigenstates and eigen- 
values of H 



Consider a particle subject to the overdamped Langevin dy- 
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and the c a represent the initial condition on the wavefunction 



c a = J dx^ a (x)^(x,0) 
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Note that since H is a Hermitian operator, the eigenstates 
are orthogonal with each other. 

It is well known that the ground state of the Hamiltonian 
|7]l is given by 
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where Zq is the partition function of the original system 

Z = fdxe-V u ® (13) 



Indeed, it is easily checked that 
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which means that is an eigenstate of H with zero eigen- 
value. In addition, since is strictly positive, it is neces- 
sarily the ground state of H, and thus all other eigenvalues are 
positive E a > 0. 

We can thus expand W(x, t) on the basis of eigenstates as 
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and we see that the relaxation time to the Boltzman distribu- 
tion is given by 
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where E\ is the smallest non zero eigenvalue. Note that the 
smallest the eigenvalue, the largest the relaxation time. 

It is well-known that if there are high energy barriers, the 
gap between the lowest energy states is exponentially small 
in the barrier height. For instance, in the double-well poten- 
tial, the energy splitting between the two lowest eigenvalues 
is known to be exponentially small in the barrier height. We 
see therefore that large energy barriers in phase space imply 
a very small energy gap and thus very long relaxation times. 
Thus any dynamics (Langevin, Monte Carlo, Molecular Dy- 
namics) that mimicks the real dynamics of the system will be 
subject to very long relaxation times and very slow relaxation 
rates. 

In order to cure the problem of the small gap, we transform 
the Hamiltonian H into a new Hamiltonian Hx which has ex- 
actly the same eigenfunctions as H but a large gap. We are 
able to do that thanks to the fact that we know exactly the 
ground state of the Hamiltonian. 

Consider the Hamiltonian operator 
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where Po is the projector onto the ground state 
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and X > is an arbitrary constant. The eigenstates and eigen- 
values of Hx are the same as those of H, except for the ground 
state energy which is shifted by A,. Indeed, we have 

H X \V ) = -X\V ) (20) 
Hx\^a} = E a \Va) for <X > 1 (21) 

and thus the new energy gap, which determines the relaxation 
time and rate is given by 



A = Ei+l 
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and can be made very large by increasing A.. 

The price to pay for increasing the gap is that the new 
Hamiltonian Hx is no more local. Also, it cannot be de- 
rived from a Langevin equation through the process described 
above. However, as we now show, it can be easily sampled, 
due to its very simple structure. 

In order to sample ^Po, one may use quantum Monte Carlo 
methods. In the following, we use the Feynman path integral 
representation. Consider the wavefunction 



(23) 



where <t>o is the initial pdf. At large time, t >> 1/A, we have 
*(*, t) ~ e-W W/ 2 c + e -A "Fi {x)a + ... (24) 

and thus the system relaxes to a Boltzmann distribution at tem- 
perature 2T with a smaller relaxation time given by 
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Using a Trotter-like approach, we discretize the exponential 



evolution operator in ( 23 1 
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where e is a small timestep and the exponential factorizes ex- 
actly because H commutes with the projector Po- Since Pq is 
a projector, we have 
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and thus 
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Using the standard expression for the kernel of the infinites- 
imal evolution operator (valid to third order in e), we have 
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All the terms in the above equation are known explicitly 
and thus can be evaluated numerically, except for the partition 
function Zq. This transfer matrix represents the conditional 
probability for the particle to be at point x' at time t + e given 
that it was at point x at time t . Note that there are two types 
of possible moves: i) either the particle goes from x to x 1 at a 
distance of order y/e, as in usual Monte Carlo, in which case 
both terms in ( |29] > may contribute, or ii) the particle moves 
to x 1 at a large distance from x, in which case the first term 
is negligible and only the second may contribute. This sec- 
ond term thus introduces some non-locality in the stochastic 
sampling process, and allows for large moves which satisfy 
detailed balance. 

The value of Zo is a priori not known. However, the precise 
value of Zq is in fact irrelevant. One obvious way to see it 



is through eq.( 17 1 and ( 19 1 which clearly show that the value 



of Zo can be absorbed in the definition of X. A more explicit 
way to deal with Zo is the following: assume we divide Zo 
by an arbitrary constant A in ( f2"9"| ) and consider the modified 
infinitesimal evolution operator 
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with matrix element 

M(x,x') = 
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where A is an arbitrary constant. 

Again, is the maximal eigenstate of M with eigen- 
value A-o = e~ e?l + (1 — e~^)AZ(). A standard method for sam- 
pling the evolution operator pi) is the diffusion Monte Carlo 
method Q. In this method, an initial population of points, 
representative of the initial guess for the probability distribu- 
tion, is replicated or deleted according to (31 1. The iteration 



of the process guarantees that the population of points will 
asymptotically be distributed according to the ground state 
wave function l^o)- The fact that the largest eigenvalue of 
M is not 1 is corrected by properly rescaling the population 
size to keep it approximately constant. 

The rate of convergence is given by the ratio X\ /Xq of the 
first eigenvalue to the maximal eigenvalue 
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The smaller this ratio, the faster the convergence to the Boltz- 
mann distribution. 

As far as the choice of X is concerned, note that all the 
equations are valid for any value of X and thus one might be 
tempted to use a very large value of X to accelerate conver- 
gence. However, this would favour sampling of distant min- 
ima, at the expense of sampling each minimum locally. It is 
difficult to assess what is the optimal value of X for an efficient 
local and non-local sampling. A small value of X would allow 
the sampling of only local minima, whereas a large value of X 



would sample only distant minima. Using the same strategy 
as in continuous Monte Carlo methods (for fluids for instance) 
0, it seems reasonable to adjust the value of X so that typi- 
cally half of the moves are local and half non-local. 

To illustrate the method, we have tested it on the one- 
dimensional double-well potential 
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In that case, the effective potential V e is a polynomial of 
degree 6, and has 3 minima at low temperature. 

First, it is easy to check numerically that the dependence of 



the spectrum of the transfer matrix (29 1 indeed satisfies equa- 
tions ( |20|21[ ) and that the gap follows eq. ( |22| ). 

We have performed a diffusion Monte Carlo calculation [2|. 
We have fixed the constant A = 1 in eq.pT). We start with 
a population of 20000 points localized in one minimum, say 
x = — 1 . For each x, we draw the next point with a trial proba- 



bility p(x) and replicate it with a weight given by (31 1 divided 
by p(x). Due to the shape of U (x), the barrier to go to x = +1 
is equal to p/2. With T = 0.02, p = 50, this barrier is equal 
to 25. Thus starting with a population of points located in one 
side of the well, the probability to tunnel to the other side is 
very small, of the order of 10~ n . The partition function Zo 
is adjusted so that the population of points remains approx- 
imately constant. In Fig. 1, we show the probability distri- 
bution obtained after 10 4 Monte Carlo steps when X = (no 
accelerating potential). Obviously, the system explores just 




Figure 1: Probability distribution with no acceleration factor after 
10000 timesteps (dashed line) compared to the true Boltzmann dis- 
tribution (solid line). 

one side of the well and cannot overcome the barrier with this 
number of MC steps. 

We have performed identical calculations with X = 5 with 
only 500 total MC steps. As can be seen on Fig. 2, the sys- 
tem is almost perfectly thermalized and the barrier has been 
easily overcome. Therefore, the sampling efficiency has been 
tremendously increased. 

To assess more quantitatively the convergence of our 
method, we define the overlap of the final normalized prob- 
ability distribution with the Boltzmann distribution 
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method to more complex optimization problems and to gener- 
alize it to discrete variable problems. 
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This overlap 8 is related to the distance d between the final 
p.d.f. and the Boltzmann distribution through the identity 

d = 2(1-6) (36) 

An overlap of 1 means identical distributions, and an over- 
lap close to means very different distributions. In Fig. 3, we 
plot the overlap as a function of the number of time steps, for 
X = (solid curve) and for A, = 0.1 (dashed curve). We use a 
small X otherwise the convergence is too fast to be seen on the 
scale of the figure. We see that the non-accelerated case is still 
far from convergence at 5000 steps, whereas the accelerated 
one has converged after about 1000 steps. 
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Figure 2: Probability distribution after 500 timesteps (dashed line), 
with an acceleration factor X = 5 compared to the true Boltzmann 
distribution (solid line). 



with 



Z 1= fdxe-^l 2 



(35) 



Figure 3: Overlap of the probability distribution with the Boltzmann 
distribution as a function of the number of MC steps. The solid curve 
corresponds to X = whereas the dashed one to X = 0.1. 

We have seen how by modifying in a simple way the quan- 
tum Hamiltonian associated to the Brownian dynamics of a 
system, one can obtain a very large acceleration of the conver- 
gence of the probability distribution to the stationary distribu- 
tion. The price to pay is that the new quantum Hamiltonian is 
no more local in space. However, it is still simple enough to 
be sampled efficiently. We are presently trying to apply this 



